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Abstract. We present an advanced algorithm for the determination of watershed 
lines on Digital Elevation Models (DEMs) , which is based on the iterative application of 
Invasion Percolation (IIP) . The main advantage of our method over previosly proposed 
ones is that it has a sub-linear time-complexity. This enables us to process systems 
comprised of up to 10 s sites in a few cpu seconds. Using our algorithm we are able to 
demonstrate, convincingly and with high accuracy, the fractal character of watershed 
lines. 

We find the fractal dimension of watersheds to be Df = 1.211 ± 0.001 for artificial 
landscapes, Df = 1.10 ± 0.01 for the Alpes and Df = 1.11 ± 0.01 for the Himalaya. 



PACS numbers: 05.45.Df, 64.60.ah, 91.10.Jf 
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1. Introduction 

The concept of watershed arises naturally in the field of Geomorphology, where it plays 
a fundamental role in e.g. water management [HOE], landslide (H El [7] and flood 
prevention [6, 8, 9J. Moreover, important applications can also be found in seemingly 
unrelated areas such as Image Processing [101 HH [12] and Medicine p3[ [HI [151 EH EE]- 
Watersheds divide adjacent water systems going into different seas and have been used 
since ancient times to delimit boundaries. Border disputes between countries, like 
for example the case of Argentina and Chile [18], have shown that it is important 
to fully understand the subtle geometrical properties of watersheds. Geographers and 
geomorphologists have studied watersheds extensively in the past. There have also 
been preliminary claims about fractality [19] but they were restricted to small-scale 
observations and therefore inconclusive. Despite the far reaching consequences of scaling 
properties on the hydrological and political issues connected to watersheds no detailed 
numerical or theoretical study has yet been performed. 

In particular in image processing or computer vision the interest in development 
of efficient algorithms is high. There one tries to simplify and/or change an images 
representation such that it is more meaningful or easier to analyze. This is done by image 
segmentation, e.g. partitioning a digital image into multiple segments (sets of pixels, 
also known as superpixels) . Typically one locates like this objects and boundaries (lines, 
curves, etc.) in images. More precisely, image segmentation is the process of assigning 
a label to every pixel in an image such that pixels with the same label share certain 
visual characteristics as color, intensity or texture. Adjacent segments are significantly 
different with respect to the same characteristic (s) (T2.J. This sounds familiar and, 
although many other methods, such as clustering, histograms [20], edge-detection [21] , 
region growing [TO], level-set or graph partitioning [22], have been developed for that 
purpose, also watersheds are currently used. This because, if one transforms an image 
to its color gradient representation and calculates the watersheds on the so achieved 
DEM like data, one can identify the obtained catchment basins as the different (color) 
regions and their boundaries to be represented by the watersheds. 

It is the purpose of the present paper to present an advanced method, which allows 
us to study the geometrical properties of watersheds on huge landscapes and with high 
statistics. Applied to an artificial landscape model we show results on length scales 
spanning over three orders of magnitude yielding highly accurate estimates of the fractal 
dimension of watersheds. Our algorithm is based on an iterative invasion-percolation 
(IP) like model. We compare to other algorithms in use and show the advances of our 
method. Furthermore, we prove the equivalence of our IP-based method to the so called 
flooding algorithm. Finally the method is applied to Digital Elevation Maps (DEM) 
from regions including the Alps and the Himalayas. 
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2. The two methods 

Traditional cartographical methods for basin delineation relied on manual estimation 
from iso-elevation lines and required a good deal of guess work. Modern procedures are 
based upon the automatic processing of DEM or Grayscale Digital Images where gray 
intensity is transformed into height. One of the most popular algorithms for watershed 
determination [10] (compare also flooding later) uses rather complicated data structures 
and at least one pass over all pixels in order to calculate watersheds, and is adequate 
for grayscale images, i.e. integer-height spaces. In the following, we will present a much 
more efficient technique. 

Let us consider a DEM with sites i having heights hi. We define the set of sites 
Sk, k = 0,1,... ,N sinks , to be the sinks of our system. On natural landscapes these 
sinks are the natural water-outlets of the terrain, as for example a lake, a river or an 
ocean. We consider that at each time step, water flows to the lowest-lying site on the 
perimeter of a flooded region. If when starting from i, sink Sk is flooded before any 
other, then site i and the whole flooded region are considered to belong to the catchment 
basin of sink Sk, equivalently one says that i drains to Sk- This procedure introduces a 
classification of lattice sites, i.e. a subdivision into non-overlapping subsets whose union 
is the entire lattice. The sets of bonds separating neighboring basins we call watersheds. 
We are interested in the geometrical properties of a single watershed line. But as natural 
landscapes normally contain more than one sink and hence more than one single line, 
it is difficult to study long range properties. We therefore restrict ourselves to study 
two-sink systems, or in other words studying only the main watershed with respect 
to these two sinks. Although we use these restrictions, our algorithm, which we will 
describe in the following, is able to deal also with multiple-sink systems. This because 
it is not restricted to certain number of sinks/labels but is free to deal with as many as 
one would like to. 

In our algorithm, that we call IP-based algorithm, a cluster is started from site i and 
grown by adding, similar to Invasion Percolation (IP), at each step, the smallest-height 
site on its perimeter until the first sink is reached (see figured]). This can be done rather 
quickly when using binary heaps or other tree methods to sort the list of perimeter sites 
according to their height values. This procedure is related to Prim's algorithm for 
growing Minimum Spanning Trees (MST) from % [2^1 [25]. When implemented blindly 
over the whole lattice, however, this process would be inefficient since in principle a new 
cluster has to be grown from each site i. By noting that all sites occupied by a cluster 
at the time the sink is reached also drain to that sink, it becomes clear that we can set 
for each site in the cluster a link to that sink. This leads to two improvements, first, we 
do not need to treat again the sites contained in the cluster as we can directly link them 
to the sink reached by the cluster, and second, we can stop growing clusters already 
when we reach a site which has an existing link to a sink. Like this, the algorithm has 
to visit each site only once. But still each site on the map has to be visited. Let us 
assume we already know a line dividing the system and we want to check if this line 
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is really the watershed of the current system. To prove this we actually only need to 
test the sites adjacent to the watershed line if they belong to different sinks on the two 
sides of the line. Hence the subset of the system, formed by these adjacent sites, is the 
minimal subset we have to test, meaning to grow IP-clusters from them, in order to 
clearly determine the watershed. Unfortunately we do not know in advance the exact 
form or position of the watershed, neither we know any of the sites in that minimal 
subset. The only thing we know is, that the watershed is located somewhere in between 
the two sinks. This means that, whatever path we follow, to go from one sink to the 
other, we have to cross the watershed. The sites we visited on such a path, before the 
watershed traversal, belong to the sink we started from and the sites afterwards belong 
to the second sink. This is what we now can use to find the first two sites of the minimal 
subset. We follow a straight line of sites connecting the two sinks. Starting from one 
sink we follow the connection site by site and grow IP-clusters from these sites. We 
proceed until we reach the first site that drains to the second sink, which means that 
we have crossed the watershed (see figured]). The bond between the last two sites, that 
drain to different sinks, is part of the watershed line and the two sites belong to the 
minimal subset as mentioned above. Now we can reconstruct the minimal subset and 
the set of bonds forming the watershed line by just following step by step the direction 
of the already found watershed at one of its two ends and testing the next two sites to 
find if the direction changes by ±90 degrees or remains the same. We just walk along 
the watershed and hence have only to test slightly more sites than in the minimal subset 
(see figured]). Of course the IP-clusters for these sites have to be grown until they reach 
a sink or sink-connected site, but there are typically large parts of the terrain that need 
not be visited at all by the algorithm in order to determine the entire divide. For use 
in multiple sink systems one would only have to consider additionally branching and 
joining of the watershed lines (also only one starting bond has to be found). That only 
a part of the system has to be considered (compare figure [2]) and there each site only 
once, is probably the biggest advantage of this procedure. The described algorithm is 
fast enough to allow for the determination of watersheds on lattices comprising 10 8 sites, 
in a few CPU seconds on a normal workstation. 

In the following we describe a slightly adjusted version of the commonly used 
watershed algorithm for image segmentation (pi)]) in order to compare efficiency and 
prove equivalency to our IP-based algorithm (actually it is making use of disjoint-set 
data structures). In this procedure, which we call flooding, the whole lattice is flooded 
(or occupied) in order of increasing height, i.e. at time t all sites with hi < h(t) are 
occupied and h(t) grows site per site in time. This procedure is also related to Kruskal's 
algorithm for growing MST's [24"l [25] . It is assumed that each site has a height, which 
is a real number, such that after sorting them, a unique sequence of sites is defined. At 
the beginning, each sink is assigned a different label, and we will furthermore specify 
that a cluster of occupied sites that gets connected (through a path of occupied sites) 
with a sink is labelled accordingly, i.e. labels propagate from the sinks. Clusters of 
occupied sites not yet in contact with any sink remain unlabeled. Whenever two different 
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Figure 1. Different stages during the search for the main watershed (green lines), 
red/blue for the labels to the two sinks, green are visited but not yet labeled sites, 
the numbers mark the integer heights of the corresponding site and the yellow box 
surrounds the starting site of the current IP-cluster, right and left periodic boundary 
conditions are applied. Upper most and lower most bonds connect to the sinks (which 
are entire rows). Further the lowest two sites are already labeled, (a)-(c) shows the 
growth of an IP-cluster until reaching a sink, (a)-(e) search for the first watershed 
bond, growing IP-clusters of each site along the first column consecutively, (e)-(g) 
Following the watershed. 
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Figure 2. Labels of the sites for a random landscape obtained during the run of the 
IP-based algorithm. Red labels (lower part) belong to the sink at the bottom row 
and the blue ones (upper part) to the sink at the upper most row. The watershed is 
marked in yellow. Only the colored sites are visited by the algorithm, meaning that 
white sites have not to be considered by the algorithm in order to obtain the right set 
of watershed bonds. 

labels would get in contact with each other because of the addition of a new site, the 
(unique) bridging site is first labelled with the label of its lowest-lying neighbor. Next all 
bonds connecting sites with different labels are "cut" so that labels from different sinks 
never mix. Bonds connecting different labels are called bridges. When this flooding 
procedure is completed, the different labels identify the corresponding catchment basins 
and the set of bridges, i.e. the set of bonds separating the different basins, identifies the 
watershed(s). In the described form the sinks are predefined by hand or applying some 
criteria. Of course one can reformulate this procedure to identify all intrinsic catchment 
basins automatically by just introducing new labels when a site gets occupied that has 
only non-occupied neighbors. This will in fact return all possible watersheds on the 
current landscape, even of the smallest basins, which is probably not what one would 
like to have. Unfortunately using the above described flooding algorithm one has in 
general to consider each site or pixel to completely determine the watershed, such that, 
already without considering the needed sorting, this algorithm performs less efficient 
than the presented IP-based algorithm. 
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3. Equivalence of Methods 

We now sketch a simple proof for the equivalence of these two procedures. During the 
flooding growth a set of unlabeled sites that become labelled by occupying site b is 
called a lake and b is called its outlet. The height of all sites in a lake is bounded by 
that of its outlet, i.e. h(i) < h(b(i)). Let us stop the flood procedure when site % gets 
labelled for the first time, say with label S^. Now start an IP cluster from % and let it 
grow for as long as it takes until some sink is first occupied. During the growth of this 
cluster, which proceeds by always adding the lowest-lying site on the perimeter, several 
lakes and their outlets may be occupied. We need to demonstrate that no bridge will be 
ever traversed during this IP growth, and therefore that sink Sk will be occupied first. 
During the growth of the IP cluster, a lake will be entirely flooded i.e. the water will 
reach the level of its outlet. Since all bridges by definition are higher than the outlet 
of a lake (because by definition bridges were occupied later than the outlet), clearly the 
next occupation will proceed through the outlet and not through the bridge. Therefore 
no bridge will be ever traversed thus proving the equivalence of both procedures. 

It is important to note that, although conceptually equivalent, the IP and the 
flooding algorithms are markedly different in terms of computational performance. First, 
the flooding needs a sorting of the sites according to their heights, which needs at least 
O(N) time. Additionally it has been shown by Fredman and Saks in [23] that building 
the disjoint-set data structure, which the algorithm is based on, needs in all cases at 
least 0(a(N)N), where a(N) is the inverse Ackermann function, which is constant for 
almost all values of N. Hence the flooding scales at least with O(N). In figure [31 for the 
random artificial landscape case, the number of sites visited by the IP-based algorithm 
is shown together with its scaling in linear dimension L, which clearly points out the 
sub-linear time- complexity of this method. The total number of visited sites iV scales as 
TV ~ L D f with Df = 1.8 ± 0.01. This is comparable to what is found in nature. There, 
river networks show fractal dimensions between 1.7 and 1.9 [26J [27J [28j [29J, [3Ql f3T| . 
On large scales we expect this fractal dimension to be dominated by the largest IP- 
cluster we grow, for which we found a fractal dimension that is in good agreement with 
literature and the result for the total number of visited sites. Furthermore we expect 
that the algorithm is even more efficient on natural landscapes, as the largest IP-cluster 
will most probably follow the main stream. Hence scaling like the main stream, which 
is reported to have fractal dimensions between 1.0 and 1.2 (26J, E7J [28j [29l [301 I3T] . 

4. Results 

As an example we applied the IP-based algorithm to artificial landscapes., where the 
height of each site on the lattice is an independent random variable uniformly distributed 
between and 1 [j], and two sinks are defined respectively as the upper- and lowermost 

t Since the watershed location only depends on the order in which sites are occupied, it is clear that 
any distribution of heights will produce the same statistical results, as discussed for example in |24j . 
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Figure 3. Log-log plot of the number N(L) of sites visited by the IP-based algorithm, 
in order to determine the watersheds on artificial landscapes, as a function of linear 
dimension L (circles). Solid line shows the least squares fit to the five last data points, 
which gives a dimension Df = 1.8 ± 0.01 for large scales (N ~ L Df ). 

lines of the lattice. Due to the high efficiency of the method we could process a huge 
amount of data and hence gather a lot of statistics. This enables us to very precisely 
estimate the fractal dimension of the watershed using the mass-scaling. We measured 
the total length of the resulting watersheds and averaged this value over at least 10 4 
samples for a given lattice size. The results for artificial landscapes are shown in figure 
ID Our data is strikingly straight on a log-log plot, which warrants neglecting finite size 
corrections. Hence, the fractal dimension is measured as the least squares fit to the 
slope of the shown data. We find that the watershed is a fractal, i.e. M ~ L D f , with 
a fractal dimension Df = 1.211 ± 0.001 [§|. This value of Df is close to that found for 
Disordered Polymers (~ 1.2 [32]), "strands" in Invasion Percolation (1.22 ± 0.01 |33j). 
and paths on MST's (1.22±0.01 [21]). The roughness exponent found for the watersheds 
is equal to unity within the statistical error bars, supporting the fact that they are indeed 
self-similar fractal objects and not self-affine. 

We also applied the IP algorithm to study the fractal properties of watersheds 
on naturally occurring landscapes, which clearly have long range correlations [34J. Of 
course, if we would like to estimate the fractal dimension of a single watershed line 
of one given landscape, as is the case for these natural topographies, we cannot work 
with the statistic approach used above for the artificial ones. Therefore we have to 
choose another method. As we follow the watershed bond per bond and hence have 
the correct ordered set of bonds, we can easily apply a yard stick method. Measuring 
now the length of the watershed in terms of a given yard stick size, meaning measuring 
the length on a certain resolution, we can define the corresponding fractal dimension 
again. We checked this procedure also on several realizations of artificial landscapes 

§ We repeated the mass-scaling analysis using the box-counting method on individual samples of 
L = 1.5 x 10 4 , and obtained a consistent result Df = 1.21 ± 0.01. 





Figure 4. The mass M of the watershed of (uncorrelated) random-height maps scales 
with linear dimension asM ~ L Df , with Df = 1.211±0.001. A box-counting procedure 
(inset) gives a consistent result. This fractal dimension is independent of the type of 
disorder, as long as each height is an uncorrelated random variable. 



and found the newly estimated fractal dimension to agree with the above presented 
Df = 1.211 ± 0.001 within the error bars. Therefore we can assume that this equality 
is also true for natural landscapes. Now, using topographic data derived from the 
Shuttle Radar Topography Mission (SRTM) [35], we analyzed several mountainous 
regions and determined their watersheds within the available resolution of 3 arc-sec 
(i.e. roughly 90 m). As shown in figure [5j a yard-stick analysis of the watershed 
performed in the range 100 m < L < 200 km for the Alps and Himalayas gives fractal 
dimensions that are indistinguishable within their error bars, namely Df 1 = 1.10 ±0.01 
and Dj l = 1.11 ± 0.01, respectively. The error bars were obtained by determining 
the lines of maximal and of minimal slope that would still be consistent with the data 
in figure [5j Again the roughness exponent found for the watersheds is equal to unity 
within the statistical error bars, hence we have still self-similar fractal objects and not 
self-affine for intermediate to large scales. The origin of the upper and lower-size scales is 
clear. On large scales (> 200 km) the watershed follows the direction of the main crust 
foldings, which depends on processes occurring on the scale of tectonic plates, which are 
non- fractal. Hence the scaling should be essentially linear (Df = 1). Although beyond 
our resolution, at small scales (< 100 m), below the size of an individual mountain, the 
watershed would connect peaks and troughs, which are typically self-affine. 

It is important to notice that the fractal dimension of the watershed line Df is 
by definition independent of the type of disorder on the landscape, as long as each 
height is an uncorrelated random variable because only the spatial order of the random 
variables (heights) in the system matters and not their relative numerical differences. 




L/5 

Figure 5. Log-log plot showing the box-counting results for the Alps watershed. The 
solid line is the best fit to the data which gives a fractal dimension of D^ 1 — 1 .10±0.01. 
The inset shows our results for the watershed calculated on the Himalayas. The fractal 
dimension in this case is = 1.11 ± 0.01. 

The small discrepancies (around 10%) between fractal dimensions of watersheds in the 
natural topographical data taken from SRTM and the artificial random landscapes can 
be explained by the presence of correlations in the first case. In particular, long-range 
correlations in space have been reported in a large variety of physical, biological and 
geological systems [361 S3 EEJ [391 HQ] . 

5. Conclusions 

We presented an advanced numerical algorithm of sub-linear time-complexity and 
showed its equivalence to a currently used watershed algorithm. Giving some qualitative 
arguments and analysis we pointed out the improved efficiency of the presented method. 
We further investigated the watershed topology of natural and synthetic DEM's. Our 
results show that watersheds generated on very large (10 s sites) uncorrelated random 
landscapes are self-similar with fractal dimension Df = 1.211 ± 0.001. Natural 
watersheds calculated from landscapes of mountainous regions like the Alps and 
Himalayas also display self-similarity, but with slightly smaller fractal dimensions, 
Df 1 = 1.10 ± 0.01 and Df 1 = 1.11 ± 0.01, respectively. The difference between the 
fractal dimensions arises probably due to the fact that long-range correlations exist in 
the natural system. The fractality of watersheds has widespread consequences on its 
susceptibility to perturbations of the topology and transport properties at the boundary 
of catchment basins. These are questions that will be investigated in the future. 
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